Locality Analysis for Distributed Shared-Memory Multiprocessors
Authors
Abstract
This paper studies the locality analysis problem for shared-memory multiprocessors, a class of parallel machines that has experienced steady and rapid growth in the past few years. The focus of this work is on estimating the memory performance of a loop nest for a given set of computation and data distributions. We assume a distributed shared-memory multiprocessor model. We discuss how to estimate the total number of cache misses (compulsory, conflict, and capacity misses), and also the fractions of these cache misses that result in local vs. remote memory accesses. The goal of our work is to use this performance estimation to guide automatic and semi-automatic selection of data distributions and loop transformations in programs written for future shared-memory multiprocessors. This paper also includes simulation results as validation of our analysis method.
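To make the quantities named in the abstract concrete, the following is a minimal trace-driven sketch in Python. It is not the paper's analytical estimation method. It assumes a direct-mapped cache and the conventional three-way miss classification, in which a miss is compulsory on the first reference to a block, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The function `classify_misses`, its parameters, and the `home_of` mapping from memory block to home node are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict


def classify_misses(trace, num_lines, line_size, proc, home_of):
    """Count hits and misses for one processor's address trace.

    Assumptions (illustrative, not taken from the paper):
      - the cache is direct-mapped with `num_lines` lines of `line_size` bytes
      - `home_of(block)` returns the node whose memory holds `block`
      - a miss is local when home_of(block) == proc, remote otherwise
    """
    seen = set()         # blocks referenced at least once so far
    direct = {}          # line index -> block currently resident
    lru = OrderedDict()  # fully associative LRU cache of equal capacity
    counts = {"hit": 0, "compulsory": 0, "capacity": 0,
              "conflict": 0, "local": 0, "remote": 0}

    for addr in trace:
        block = addr // line_size
        index = block % num_lines
        dm_hit = direct.get(index) == block
        fa_hit = block in lru

        # Update the fully associative reference model.
        if fa_hit:
            lru.move_to_end(block)
        else:
            if len(lru) >= num_lines:
                lru.popitem(last=False)  # evict least recently used block
            lru[block] = True

        # Update the direct-mapped cache.
        direct[index] = block

        if dm_hit:
            counts["hit"] += 1
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            # Every miss is served by either local or remote memory.
            if home_of(block) == proc:
                counts["local"] += 1
            else:
                counts["remote"] += 1
        seen.add(block)

    return counts


if __name__ == "__main__":
    # Blocks 0-1 live on node 0 and blocks 2-3 on node 1 (block distribution).
    result = classify_misses([0, 16, 0, 8, 24, 0, 8], num_lines=2,
                             line_size=8, proc=0, home_of=lambda b: b // 2)
    print(result)
```

A simulation of this kind only counts misses for one concrete trace. The analysis described in the abstract instead estimates such counts from the loop nest and its computation and data distributions, and uses simulation for validation.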
Similar resources
Locality Information Based Scheduling in Shared Memory Multiprocessors
Lightweight threads have become a common abstraction in the field of programming languages and operating systems. This paper examines the performance implications of locality information usage in thread scheduling algorithms for scalable shared-memory multiprocessors. The elements of a distributed scheduler using all available locality information as well as experimental measurements are prese...
Enhancing the Performance of Autoscheduling with Locality-Based Partitioning in Distributed Shared Memory Multiprocessors
Abstract. Autoscheduling is a parallel program compilation and execution model that uniquely combines three features: automatic extraction of loop and functional parallelism at any level of granularity, dynamic scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared memory multiprocessors. This paper presents a technique that enhances the performance of autosche...
Access Descriptor Based Locality Analysis for Distributed-Shared Memory Multiprocessors
Most of today’s multiprocessors have a Distributed-Shared Memory (DSM) organization, which enables scalability while retaining the convenience of the shared-memory programming paradigm. Data locality is crucial for performance in DSM machines, due to the difference in access times between local and remote memories. In this paper, we present a compile-time representation that captures the memory ...
Bidirectional Ring: An Alternative to the Hierarchy of Unidirectional Rings
A hierarchy of unidirectional rings has been used successfully in distributed shared-memory multiprocessors. The fixed cluster size of the hierarchy prevents full exploitation of communication locality. The bidirectional ring is presented as an alternative to the hierarchy. Its relative performance is evaluated for a variety of memory access patterns and network sizes. It gives superior performan...
Enhancing the Performance of Autoscheduling in Distributed Shared Memory Multiprocessors
Abstract. Autoscheduling is a parallel program compilation and execution model that uniquely combines three features: automatic extraction of loop and functional parallelism at any level of granularity, dynamic scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared memory multiprocessors. This paper presents a technique that enhances the performance of autosche...